    Fast Penalized Regression and Cross Validation for Tall Data with the oem Package

    A large body of research has focused on theory and computation for variable selection techniques for high-dimensional data. There has been substantially less work in the big "tall" data paradigm, where the number of variables may be large but the number of observations is much larger. The orthogonalizing expectation maximization (OEM) algorithm is one approach for computing penalized models that excels in the big tall data regime. The oem package is an efficient implementation of the OEM algorithm that provides a multitude of computation routines with a focus on big tall data, including functions for out-of-memory computation and for large-scale parallel computation of penalized regression models. Furthermore, in this paper we propose a specialized implementation of the OEM algorithm for cross validation, dramatically reducing the computing time for cross validation over a naive implementation.
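
    To make the computational idea concrete, the following minimal NumPy sketch applies the kind of majorize-minimize update used by OEM-style algorithms to a lasso-penalized least squares problem: the quadratic loss is majorized with curvature d >= lambda_max(X'X), so each iteration only touches the p x p matrix X'X and the vector X'y, which is what makes the approach attractive when n >> p. This is an illustration of the algorithmic idea only, not the oem package's API; all names below are assumptions.

    ```python
    import numpy as np

    def soft_threshold(u, lam):
        """Elementwise soft-thresholding operator."""
        return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

    def lasso_mm(X, y, lam, n_iter=500):
        """Minimize 0.5*||y - X b||^2 + lam*||b||_1 with a majorize-minimize
        update of the kind used by OEM-type algorithms (sketch, not the oem API).

        The quadratic loss is majorized with curvature d >= lambda_max(X'X), so
        only X'X and X'y (computed once) are needed per iteration -- convenient
        for "tall" data where n >> p.
        """
        XtX = X.T @ X                        # p x p, reused across iterations
        Xty = X.T @ y
        d = np.linalg.eigvalsh(XtX)[-1]      # upper bound on the curvature
        b = np.zeros(X.shape[1])
        for _ in range(n_iter):
            u = Xty + (d * b - XtX @ b)      # = X'y + (dI - X'X) b
            b = soft_threshold(u, lam) / d   # closed-form penalized update
        return b
    ```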

    Subgroup Identification Using the personalized Package

    A plethora of disparate statistical methods have been proposed for subgroup identification to help tailor treatment decisions for patients. However, a majority of them do not have corresponding R packages, and the few that do pertain to particular statistical methods or provide little means of evaluating whether meaningful subgroups have been found. Recently, the work of Chen, Tian, Cai, and Yu (2017) unified many of these subgroup identification methods into one general, consistent framework. The goal of the personalized package is to provide a corresponding unified software framework for subgroup identification analyses that offers not only estimation of subgroups but also evaluation of treatment effects within estimated subgroups. The personalized package allows for a variety of subgroup identification methods for many types of outcomes commonly encountered in medical settings. The package is built to incorporate the entire subgroup identification analysis pipeline, including propensity score diagnostics, subgroup estimation, analysis of the treatment effects within subgroups, and evaluation of identified subgroups. In this framework, different methods can be accessed with little change in the analysis code. Similarly, new methods can easily be incorporated into the package. Besides familiar statistical models, the package also allows flexible machine learning tools to be leveraged in subgroup identification. Further estimation improvements can be obtained via efficiency augmentation.
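
    As a concrete illustration of the "evaluation of treatment effects within estimated subgroups" step, the sketch below computes inverse-propensity-weighted outcome contrasts separately in the subgroup recommended treatment and the subgroup recommended control. It is a generic sketch of that kind of check, not the personalized package's interface; the function and argument names are assumptions.

    ```python
    import numpy as np

    def subgroup_effects(y, trt, prop, benefit_score):
        """IPW treatment-effect estimates within the two estimated subgroups.

        y             : outcomes
        trt           : treatment indicator (1 = treated, 0 = control)
        prop          : estimated propensity scores P(trt = 1 | x)
        benefit_score : estimated benefit scores; > 0 means treatment recommended

        A useful sanity check is that the contrast is positive where treatment
        is recommended and negative (or near zero) where it is not.
        """
        w = trt / prop + (1 - trt) / (1 - prop)      # inverse-propensity weights
        effects = {}
        for label, g in [("recommend treatment", benefit_score > 0),
                         ("recommend control", benefit_score <= 0)]:
            wt = w[g]
            mean1 = np.average(y[g][trt[g] == 1], weights=wt[trt[g] == 1])
            mean0 = np.average(y[g][trt[g] == 0], weights=wt[trt[g] == 0])
            effects[label] = mean1 - mean0           # weighted E[Y|A=1] - E[Y|A=0]
        return effects
    ```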

    Enhancing modified treatment policy effect estimation with weighted energy distance

    The effects of continuous treatments are often characterized through the average dose response function, which is challenging to estimate from observational data due to confounding and positivity violations. Modified treatment policies (MTPs) are an alternative approach that aims to assess the effect of a modification to observed treatment values and works under relaxed assumptions. Estimators for MTPs generally focus on estimating the conditional density of treatment given covariates and using it to construct weights. However, weighting using conditional density models has well-documented challenges. Further, MTPs with larger treatment modifications have stronger confounding, and no tools exist to help choose an appropriate modification magnitude. This paper investigates the role of weights for MTPs, showing that, to control confounding, weights should balance the weighted data toward an unobserved hypothetical target population that can nonetheless be characterized with observed data. Leveraging this insight, we present a versatile set of tools to enhance estimation for MTPs. We introduce a distance that measures imbalance of covariate distributions under the MTP and use it to develop new weighting methods and tools to aid in the estimation of MTPs. We illustrate our methods through an example studying the effect of mechanical power of ventilation on in-hospital mortality.
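
    The imbalance measure alluded to above builds on the energy distance between covariate distributions. As a rough illustration, the sketch below computes a weighted energy distance between a weighted sample and a sample standing in for the post-modification target population; how the paper actually constructs the target and the weights may differ, and the names here are assumptions.

    ```python
    import numpy as np
    from scipy.spatial.distance import cdist

    def weighted_energy_distance(X, Y, w=None, v=None):
        """Weighted energy distance between samples X (n x p) and Y (m x p).

        D = 2*E||X - Y|| - E||X - X'|| - E||Y - Y'||, with expectations taken
        under the normalized weights w and v.  D = 0 iff the two weighted
        empirical distributions coincide, so larger values indicate greater
        covariate imbalance between the weighted data and the target sample.
        """
        w = np.ones(len(X)) if w is None else np.asarray(w, dtype=float)
        v = np.ones(len(Y)) if v is None else np.asarray(v, dtype=float)
        w, v = w / w.sum(), v / v.sum()
        d_xy = cdist(X, Y)   # pairwise Euclidean distances
        d_xx = cdist(X, X)
        d_yy = cdist(Y, Y)
        return 2 * w @ d_xy @ v - w @ d_xx @ w - v @ d_yy @ v
    ```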

    Counterfactual fairness for small subgroups

    While methods for measuring and correcting differential performance in risk prediction models have proliferated in recent years, most existing techniques can only be used to assess fairness across relatively large subgroups. The purpose of algorithmic fairness efforts is often to redress discrimination against groups that are both marginalized and small, so this sample size limitation often prevents existing techniques from accomplishing their main aim. We take a three-pronged approach to address the problem of quantifying fairness with small subgroups. First, we propose new estimands built on the "counterfactual fairness" framework that leverage information across groups. Second, we estimate these quantities using a larger volume of data than existing techniques. Finally, we propose a novel data borrowing approach to incorporate "external data" that lacks outcomes and predictions but contains covariate and group membership information. This less stringent requirement on the external data allows for more possibilities for external data sources. We demonstrate practical application of our estimators to a risk prediction model used by a major Midwestern health system during the COVID-19 pandemic.

    A reluctant additive model framework for interpretable nonlinear individualized treatment rules

    Individualized treatment rules (ITRs) for treatment recommendation are an important topic in precision medicine, as not all beneficial treatments work well for all individuals. Interpretability is a desirable property of ITRs, as it helps practitioners make sense of treatment decisions, yet ITRs also need to be flexible enough to effectively model complex biomedical data for treatment decision making. Many ITR approaches either focus on linear ITRs, which may perform poorly when true optimal ITRs are nonlinear, or on black-box nonlinear ITRs, which may be hard to interpret and can be overly complex. This dilemma indicates a tension between interpretability and accuracy of treatment decisions. Here we propose an additive model-based nonlinear ITR learning method that balances interpretability and flexibility of the ITR. Our approach aims to strike this balance by allowing both linear and nonlinear terms of the covariates in the final ITR. Our approach is parsimonious in that a nonlinear term is included in the final ITR only when it substantially improves the ITR's performance. To prevent overfitting, we combine cross-fitting and a specialized information criterion for model selection. Through extensive simulations, we show that our methods are data-adaptive to the degree of nonlinearity and can favorably balance ITR interpretability and flexibility. We further demonstrate the robust performance of our methods with an application to a cancer drug sensitivity study.
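
    As a toy illustration of the general "reluctant" principle (fit the linear part first and admit a nonlinear term only when it clearly earns its keep on held-out data), the sketch below augments a linear regression with per-covariate polynomial terms one at a time. The paper's actual ITR estimator, loss, cross-fitting scheme, and information criterion differ; everything named here is an assumption.

    ```python
    import numpy as np

    def reluctant_additive_fit(X_tr, y_tr, X_val, y_val, degree=3, tol=1e-3):
        """Toy 'reluctant' additive regression: start from a linear fit and add
        polynomial terms for a covariate only when they reduce held-out error
        by more than `tol`.  Illustrative only; not the paper's estimator.
        """
        def lstsq_fit(A_tr, A_val):
            coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
            return coef, np.mean((y_val - A_val @ coef) ** 2)

        A_tr = np.hstack([np.ones((len(X_tr), 1)), X_tr])
        A_val = np.hstack([np.ones((len(X_val), 1)), X_val])
        coef, err = lstsq_fit(A_tr, A_val)            # linear baseline
        kept = []
        for j in range(X_tr.shape[1]):                # try one nonlinear block at a time
            extra_tr = X_tr[:, [j]] ** np.arange(2, degree + 1)
            extra_val = X_val[:, [j]] ** np.arange(2, degree + 1)
            cand_coef, cand_err = lstsq_fit(np.hstack([A_tr, extra_tr]),
                                            np.hstack([A_val, extra_val]))
            if cand_err < err - tol:                  # keep it only if it clearly helps
                A_tr = np.hstack([A_tr, extra_tr])
                A_val = np.hstack([A_val, extra_val])
                coef, err, kept = cand_coef, cand_err, kept + [j]
        return coef, kept
    ```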

    Causally-interpretable meta-analysis: clearly-defined causal effects and two case studies

    Meta-analysis is commonly used to combine results from multiple clinical trials, but traditional meta-analysis methods do not refer explicitly to a population of individuals to whom the results apply, and it is not clear how to use their results to assess a treatment's effect for a population of interest. We describe recently introduced causally-interpretable meta-analysis methods and apply their treatment effect estimators to two individual-participant data sets. These estimators transport estimated treatment effects from studies in the meta-analysis to a specified target population using individuals' potentially effect-modifying covariates. We consider different regression and weighting methods within this approach and compare the results to traditional aggregated-data meta-analysis methods. In our applications, certain versions of the causally-interpretable methods performed somewhat better than the traditional methods, but the latter generally did well. The causally-interpretable methods offer the most promise when covariates modify treatment effects, and our results suggest that traditional methods work well when there is little effect heterogeneity. The causally-interpretable approach gives meta-analysis an appealing theoretical framework by relating an estimator directly to a specific population and lays a solid foundation for future developments.
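
    A standard weighting device for transporting a trial's effect estimate to a target population, in the spirit of the weighting estimators compared above, reweights trial participants by their estimated odds of belonging to the target population given covariates. The sketch below illustrates that idea with a logistic membership model; it is a generic illustration rather than any of the paper's specific estimators, and all names and modeling choices are assumptions.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def transported_effect(X_trial, y, a, X_target):
        """Transport a trial's treatment-effect estimate to a target sample by
        inverse-odds-of-participation weighting (generic sketch).

        X_trial  : covariates of randomized-trial participants
        y, a     : trial outcomes and treatment indicators (1 = treated)
        X_target : covariates of the target population sample
        """
        # Model P(in target | x) by stacking the two samples and fitting a
        # logistic regression for sample membership.
        X = np.vstack([X_trial, X_target])
        s = np.concatenate([np.zeros(len(X_trial)), np.ones(len(X_target))])
        p = LogisticRegression(max_iter=1000).fit(X, s).predict_proba(X_trial)[:, 1]
        w = p / (1 - p)                      # odds of target membership for trial units
        mean1 = np.average(y[a == 1], weights=w[a == 1])
        mean0 = np.average(y[a == 0], weights=w[a == 0])
        return mean1 - mean0                 # weighted difference in arm means
    ```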

    A method for comparing multiple imputation techniques: A case study on the U.S. national COVID cohort collaborative.

    Healthcare datasets obtained from Electronic Health Records have proven to be extremely useful for assessing associations between patients’ predictors and outcomes of interest. However, these datasets often suffer from missing values in a high proportion of cases, whose removal may introduce severe bias. Several multiple imputation algorithms have been proposed to attempt to recover the missing information under an assumed missingness mechanism. Each algorithm presents strengths and weaknesses, and there is currently no consensus on which multiple imputation algorithm works best in a given scenario. Furthermore, the selection of each algorithm’s parameters and data-related modeling choices are also both crucial and challenging.
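
    One common way to compare imputation algorithms on a given dataset (not necessarily the procedure proposed in the paper) is to mask a subset of observed entries, impute them with each candidate method, and score the imputations against the held-out truth. The sketch below illustrates that masking-based comparison with two scikit-learn imputers; all names are assumptions.

    ```python
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import SimpleImputer, IterativeImputer

    def compare_imputers(X, mask_frac=0.1, seed=0):
        """Mask a fraction of observed entries, impute with each method, and
        report RMSE against the held-out true values (generic comparison
        strategy, not necessarily the paper's method).  X is a float array
        with missing values coded as NaN."""
        rng = np.random.default_rng(seed)
        observed = ~np.isnan(X)
        mask = observed & (rng.random(X.shape) < mask_frac)   # entries to hide
        X_masked = X.copy()
        X_masked[mask] = np.nan
        results = {}
        for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                              ("iterative", IterativeImputer(random_state=seed))]:
            X_imp = imputer.fit_transform(X_masked)
            results[name] = np.sqrt(np.mean((X_imp[mask] - X[mask]) ** 2))
        return results
    ```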